
Fix basic usage pencil initialization #140

Merged
romerojosh merged 1 commit into NVIDIA:main from fallintoplace:fix/basic-usage-initialize-pencil on Jun 16, 2026

@fallintoplace (Contributor)

Summary

This fixes the CUDA initialize_pencil example kernel used by the basic usage examples.

The kernel computes a flattened pencil element index, l, but writes the initialized values through data[i]. As a result, threads that share the same first-dimension coordinate race on the same buffer slots, while most of the pencil allocation is never written.

This PR updates the device initialization path to store through data[l], matching the host-side flattened loop already shown in the example. It also tightens the guard to l >= pinfo.size. The old l > pinfo.size check let the thread with l == pinfo.size through, so with a rounded-up launch one extra thread could write one element past the end of the valid pencil range; the new guard makes all of the extra threads exit. The documentation snippet is updated to match the source examples.

Validation

  • git diff --check
  • Searched the touched examples and docs to confirm that the old data[i] store and the l > pinfo.size guard are gone

I could not build or run the CUDA/MPI example locally because this machine does not have nvcc or mpicxx installed.

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>

@romerojosh (Collaborator) left a comment

Thanks for the contribution and good catch @fallintoplace! Changes LGTM!

@romerojosh merged commit 61e5488 into NVIDIA:main on Jun 16, 2026
4 checks passed